Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger

نویسندگان

  • Changchun Bao
  • Weiqun Xu
  • Yonghong Yan
چکیده

Named Entity Recognition (NER) is an important task in information extraction, where major attention has been paid to written texts of a news or academic paper (esp. biomedical) style. In this paper we report the first piece of work on NER in spoken Chinese dialogues, as a preliminary step for spoken language understanding. The NER task is taken as a sequential classification problem and solved with a character-level maximum entropy (maxent) tagger. Despite that spoken data seems noisier than written data, with a set of carefully selected features, the maxent tagger achieves an overall F1 score of 91.87 on our dialogue data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Named entity extraction from Japanese broadcast news

This paper describes a method for named entity extraction from Japanese broadcast news. Our proposed named entity tagger gives entity categories for every character in order to deal with unknown words and entities correctly. This character-based tagger has models designed by maximum entropy modeling. We discuss the efficiency of the proposed tagger by comparison with a conventional word-based t...

متن کامل

Chinese Character-based Segmentation & POS-tagging and Named Entity Identification with a CRF Chunker

In this paper, we propose a character-based conditional random field (CRF) chunker to identify Chinese named entity words in the text files. The input for it is from a character-based tagger in which the segmentation and partof-speech (POS) tagging are conducted simultanueously. The character-based tagger is trained by using a corpus in which each character is tagged with both its position (POC...

متن کامل

Language Independent NER using a Maximum Entropy Tagger

Named Entity Recognition (NER) systems need to integrate a wide variety of information for optimal performance. This paper demonstrates that a maximum entropy tagger can effectively encode such information and identify named entities with very high accuracy. The tagger uses features which can be obtained for a variety of languages and works effectively not only for English, but also for other l...

متن کامل

Using Embeddings for Both Entity Recognition and Linking in Tweet

English. The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a technique of Named Entity tagging that exploits both character-level and word-level embeddings. Character-based embeddings allow learning the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger al...

متن کامل

OOV Sensitive Named-Entity Recognition in Speech

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008